Crash Course in SQL

If you want to work with databases, you must learn to speak their language. Databases speak Structured Query Language, better known as SQL, which was invented by E. F. Codd in the 1970s. Instead of working with tables one record at a time, SQL manages groups of records as a single entity, which makes it suitable for creating queries of any complexity. This language has been standardized, and now most database servers, and ADO itself, accept its ANSI-92 dialect.

SQL encompasses two distinct categories of statements: data definition language (DDL) and data manipulation language (DML). The DDL subset includes a group of statements that allow you to create database structures, such as tables, fields, indices, and so on. The DML subset includes all the commands that allow you to query and modify the data in the database, add new records, or delete existing ones. While both subsets are equally important, most of the time you'll use only DML statements to retrieve and update data stored in a database whose structure has been already defined in advance (possibly by another developer or the database administrator). For this reason, I focus in this section exclusively on the DML subset of the language. You need the information contained in this section to build queries that can't be created interactively using the SQL Query Builder (which permits you to create only the simplest queries).

Countless books have been written about SQL, so it's impossible to tell you anything new in a few pages. I'm writing this section only to enable those of you who've never worked with databases to understand the SQL queries that are used in the rest of the book. If you're already proficient with SQL, you can jump to the next chapter without any hesitation.

Most of the examples in the following sections are based on the Biblio.mdb or the NWind.mdb databases. On the companion CD, you'll find a sample application (shown in Figure 8-27) that lets you practice with SQL and immediately see the results of your query. You can recall previous queries using the less-than and greater-than buttons, and you can also safely perform action queries that delete, insert, or modify records because all your operations are wrapped in a transaction that's rolled back when you close the form.

Figure 8-27. The SQL Training sample application.

The SELECT Command

The most frequently used SQL statement is undoubtedly the SELECT command, which returns a set of records based on selection criteria.

Basic selections

The simplest SELECT command returns all the records and all the fields from a database table:

SELECT * FROM Publishers

(In all the examples in this section, I use uppercase characters for SQL keywords, but you can write them in lowercase characters if you like.) You can refine a SELECT command by specifying the list of fields you want to retrieve. If the field name includes spaces or other symbols, you must enclose it within square brackets:

SELECT PubID, [Company Name], Address FROM Publishers

You can often speed up a query by retrieving only the fields you're actually going to use in your application. SQL supports simple expressions in the field list portion of a SELECT. For example, you determine the age of each author at the turn of the century using the following command:

SELECT Author, 2000-[Year Born] AS Age FROM Authors

Notice how the AS clause lets you assign a well-defined name to a calculated field, which would otherwise be labeled with a generic name such as Expr1001. You can also use aggregate functions such as COUNT, MIN, MAX, SUM, and AVG on the table field, as in this code:

SELECT COUNT(*) AS AuthorCnt, AVG(2000-[Year Born]) AS AvgAge FROM Authors

This statement returns only one record with two fields: AuthorCnt is the number of records in the Authors table, and AvgAge is the average of the age of all authors in the year 2000.

Aggregate functions generally consider only non-Null values in the database. For example, you can learn the number of authors for which the average has been evaluated using this statement:

SELECT COUNT([Year Born]) FROM Authors

The Count(*) syntax is an exception to the general rule in that it returns the number of all the records in the result. In a real application, you rarely retrieve all the records from a table, though, for a good reason: If the table contains thousands of records, you're going to add too much overhead to your system and the network. You filter a subset of the records in the table using the WHERE clause. For example, you might want to retrieve the names of all the publishers in California:

SELECT Name, City FROM Publishers WHERE State = 'CA'

You can also combine multiple conditions using AND and OR Boolean operators, as in the following query, which retrieves only the Californian publishers whose names begin with the letter M:

SELECT * FROM Publishers WHERE State = 'CA' AND Name LIKE 'M%'

In the WHERE clause, you can use all the comparison operators (=, <, <=, >, >=, and <>) and the LIKE, BETWEEN, and IN operators. The BETWEEN operator is used to select all the values in a range:

SELECT * FROM Titles WHERE [Year Published] BETWEEN 1996 AND 1998

The IN operator is useful when you have a list of values. The query below returns all the publishers located in California, Texas, and New Jersey.

SELECT Name, State FROM Publishers WHERE State IN ('CA', 'TX', 'NJ')

SQL lets you embed strings in single quotes or in double quotes. Because you usually pass these statements as Visual Basic strings, using single quotes is often more practical. But when the string contains a single quote, you must use two consecutive single quotes. For example, to search for an author named O'Hara, you must use the following query:

SELECT * FROM Authors WHERE Author = 'O''Hara'

Sorting and grouping

The ORDER BY clause lets you affect the order in which records are retrieved. For example, you can display publishers in alphabetical order:

SELECT * FROM Publishers ORDER BY [Company Name]

You can also specify multiple sort keys by separating the keys with commas. Furthermore, for each sort key you can add the DESC keyword to sort in descending order. For example, you can list publishers sorted by state in ascending order and at the same time list all the publishers in the same state by city in descending order, as you can see in the following statement. (This is, admittedly, not a really useful thing to do, but it works as an example.)

SELECT * FROM Publishers ORDER BY State, City DESC

When the results are sorted, you can decide to take just the first records returned by the SELECT, which you do with the TOP clause. For example, you can retrieve the five titles published more recently using the following query:

SELECT TOP 5 * FROM Titles ORDER BY [Year Published] DESC

Keep in mind, however, that the TOP clause always returns all the records with a given value in the field for which the results are sorted. For example, the version of the Biblio.mdb database I'm working with includes seven titles published in the most recent year (1999), and therefore the preceding query will return seven records, not five.

You can define the number of returned records in terms of the percentage of the total number of the records that would be returned, using the TOP PERCENT clause:

SELECT TOP 10 PERCENT * FROM Titles ORDER BY [Year Published] DESC

The GROUP BY clause lets you create summary records that include aggregate values from groups of other records. For example, you can create a report with the number of titles published in each year by using the following query:

SELECT [Year Published], COUNT(*) As TitlesInYear FROM Titles GROUP BY [Year Published]

The next query displays the number of titles published in the last 10 years:

SELECT TOP 10 [Year Published], COUNT(*) As TitlesInYear FROM Titles GROUP BY [Year Published] ORDER BY [Year Published] DESC

You can prepare more sophisticated groupings with the HAVING clause. This clause is similar to the WHERE clause, but it acts on the fields produced by the GROUP BY clause and is often followed by an aggregate expression. The next query is similar to the previous one but returns a record only for those years when more than 50 titles have been published:

SELECT [Year Published], COUNT(*) As TitlesInYear FROM Titles GROUP BY [Year Published] HAVING COUNT(*) > 50

You can have both a WHERE clause and a HAVING clause in the same query. If you use both, SQL applies the WHERE clause first to filter the records from the original table. The GROUP BY clause then creates the groups, and finally the HAVING clause filters out the grouped records that don't meet the condition it specifies.

The following statement returns the names of all the cities where there's at least one publisher:

SELECT City FROM Publishers

This query presents some problems in that it also returns records for which the City field is Null, and, in addition, it returns duplicate values when there's more than one publisher in a city. You can solve the first problem by testing the field with the ISNULL function, and you can filter out duplicates using the DISTINCT keyword:

SELECT DISTINCT City FROM Publishers WHERE NOT ISNULL(City)

Subqueries

All the examples I've shown you so far retrieved their records from just one table. Most of the time, however, the data you're interested in is distributed in multiple tables. For example, to print a list of titles and their publishers you must access both the Titles and the Publishers tables. You do this easily by specifying both table names in the FROM clause and then setting a suitable WHERE clause:

SELECT Titles.Title, Publishers.Name FROM Titles, Publishers WHERE Titles.PubID = Publishers.PubID

You use the tablename.fieldname syntax to avoid ambiguities when the two tables have fields with the same name. Here's another example, which retrieves all the titles published by a given publisher. You need to specify both tables in the FROM clause, even if the returned fields come only from the Titles table:

SELECT Titles.* FROM Titles, Publishers WHERE Publishers.Name = 'MACMILLAN'

But you can use another—and often more efficient—way to retrieve the same information, based on the fact that the SELECT statement can return a value, which you can use to the left of the = operator. The next query uses a nested SELECT query to get a list of all the titles from a given publisher:

SELECT * FROM Titles WHERE PubID = (SELECT PubID FROM Publishers WHERE Name = 'MACMILLAN')

If you aren't sure whether the subquery returns only one record, use the IN operator instead of the equal sign. You can make these subqueries as complex as you like. For example, the following query returns the titles published by all the publishers from California, Texas, and New Jersey:

SELECT * FROM Titles WHERE PubId IN (SELECT PubID FROM Publishers WHERE State IN ('CA', 'TX', 'NJ'))

You can also use aggregate functions, such as SUM or AVG. The next query returns all the items from the Orders table in NWind.mdb for which the Freight value is higher than the average Freight value:

SELECT * FROM Orders WHERE Freight > (SELECT AVG(Freight) FROM Orders)

Joins

The join operation is used to retrieve data from two tables that are related to each other through a common field. Conceptually, the result of the join is a new table whose rows consist of some or all the fields from the first table followed by some or all the fields from the second table; the expression in the ON clause in a JOIN command determines which rows from the second table will match a given row from the first table. For example, the following query returns information about all titles, including the name of their publisher. I already showed that you can complete this task using a SELECT command with multiple tables, but an INNER JOIN command is often better:

SELECT Titles.Title, Titles.[Year Published], Publishers.Name FROM Titles INNER JOIN Publishers ON Titles.PubID = Publishers.PubID

This is an important detail: the previous statement retrieves only those titles for which there is a publisher, that is, those whose PubID field isn't Null. While the INNER JOIN (also known as equi-join) is the most common form of join operation, SQL also supports two other types of joins, the LEFT JOIN and the RIGHT JOIN operations. The LEFT JOIN operation retrieves all the records in the first table, regardless of whether there's a corresponding record in the other table. For example, the following command retrieves all the titles, even if their publisher isn't known:

SELECT Titles.Title, Titles.[Year Published], Publishers.Name FROM Titles LEFT JOIN Publishers ON Titles.PubID = Publishers.PubID

The RIGHT JOIN operation retrieves all the records in the second table, even if there isn't a related record in the first table. The following statement selects all the publishers, whether or not they have published any titles:

SELECT Titles.Title, Titles.[Year Published], Publishers.Name FROM Titles RIGHT JOIN Publishers ON Titles.PubID = Publishers.PubID

Join operations can be nested. Here's an example that retrieves information about all authors and the books they've written. The two tables are related through the intermediate table Title Author, so we need a nested INNER JOIN operation:

SELECT Author, Title, [Year Published] FROM Authors INNER JOIN ([Title Author] INNER JOIN Titles ON [Title Author].ISBN = Titles.ISBN) ON Authors.Au_Id = [Title Author].Au_ID

Of course, you can filter records using a WHERE clause in both the nested and the external join. For example, you can return only titles published before 1960:

SELECT Author, Title, [Year Published] FROM Authors INNER JOIN ([Title Author] LEFT JOIN Titles ON [Title Author].ISBN = Titles.ISBN) ON Authors.Au_Id = [Title Author].Au_ID WHERE [Year Published] < 1960

Unions

You can append results to a SELECT command using the UNION keyword. Say that you want to send your Christmas greetings to all your customers and suppliers. You can retrieve their names and addresses using the following query:

SELECT Name, Address, City FROM Customers UNION SELECT CompanyName, Address, City FROM Suppliers

The two tables can have different structures, provided that the fields returned by each SELECT command are of the same type.

The INSERT INTO Command

The INSERT INTO command adds a new record to a table and sets its fields in one operation. You must provide a list of field names and values, as in the following statement:

INSERT INTO Authors (Author, [Year Born]) VALUES ('Frank Whale', 1960)

If the table has a key field that's automatically generated by the database engine—as is true for the Au_Id field in the Authors table—you don't have to include it in the field list. Null values are inserted in all the columns that you omit from the field list and that aren't automatically generated by the database engine. If you want to insert data that's already stored in another table, you can append a SELECT command to the INSERT INTO statement. For example, the following command copies all the records from a table called NewAuthors into the Authors table:

INSERT INTO Authors SELECT * FROM NewAuthors

You often need a WHERE clause to limit the number of records that are inserted. You can copy from tables with a different structure or with different name fields, but in this case you need to use aliases to make field names match. The statement which follows copies an entry from the Contact table to the Customer table but accounts for different field names.

INSERT INTO Customers SELECT ContactName AS Name, Address, City, State FROM Contacts WHERE Successful = True

The UPDATE Command

The UPDATE command modifies the values in one or more records. You often use a WHERE clause to restrict its action to the record(s) you're interested in:

UPDATE Authors SET [Year Born] = 1961 WHERE Author = 'Frank Whale'

You can also use expressions in SET clauses. For example, the following statement increments the discount for all the items in the Order Details table that have been ordered by customer LILAS. (Run this query against the NWind.mdb database.)

UPDATE [Order Details] INNER JOIN Orders ON [Order Details].OrderID = Orders.OrderID SET Discount = Discount + 0.10 WHERE CustomerID = 'LILAS'

The DELETE Command

The DELETE command lets you remove one or more records from a table. You must append a WHERE clause to this command unless you want to delete all the records in the table. For example, the following command deletes all the titles that were published before 1950:

DELETE FROM Titles WHERE [Year Published] < 1950

A DELETE operation can fire cascading delete operations in other tables if an integrity relationship is enforced between the two tables. For example, you can delete a record in the Orders table in NWind.mdb and the Jet engine automatically deletes all the related records in the Order Details table. In general, however, you can't delete a record in a table if a foreign key in another table points to it. For example, you can't delete a record in the Employees table until you delete the records in the Orders table whose EmployeeID values are pointing to the record you want to delete. You can do the latter operation using an INNER JOIN clause in the DELETE command. Keep in mind that when multiple tables are involved, you need to specify which one you want to delete records from immediately after the DELETE command:

DELETE Orders.* FROM Orders INNER JOIN Employees ON Orders.EmployeeID = Employees.EmployeeID WHERE Employees.LastName = 'King'

After that, you can delete the records in the Employees table:

DELETE FROM Employees WHERE Employees.LastName = 'King'

This chapter concludes the first part of this book, which has covered all the basic concepts you need before diving into more complex programming issues. We'll go back to database programming in Chapter 13, but in the meantime let's see how you can take advantage of the ActiveX controls that come in the Visual Basic package.